We Tweet Like We Talk and Other Interesting Observations: An Analysis of English Communication Modalities
Abstract
Modalities of communication for human beings are gradually increasing in number with the advent of new forms of technology. Many human beings can readily transition between these different forms of communication with little or no effort, which raises the question: How similar are these different communication modalities? To understand technology’s influence on English communication, four different corpora were analyzed and compared: Writing from Books using the 1-grams database from the Google Books project, Twitter, IRC Chat, and transcribed Talking. Multiword confusion matrices revealed that Talking has the most similarity when compared to the other modes of communication, while 1-grams were the least similar form of communication analyzed. Based on the analysis of word usage, word usage frequency distributions, and word class usage, among other things, Talking is also the most similar to Twitter and IRC Chat. This suggests that communicating using Twitter and IRC Chat evolved from Talking rather than Writing. When we communicate online, even though we are writing, we do not Tweet or Chat how we write books; we Tweet and Chat how we Speak. Nonfiction and Fiction writing were clearly differentiable in our analysis, with Twitter and Chat being much more similar to Fiction than to Nonfiction writing. These hypotheses were then tested using the author and journalist Cory Doctorow. Mr. Doctorow’s Writing, Twitter usage, and Talking were all found to have vocabulary usage patterns very similar to those of the aggregated populations, as long as the writing was Fiction. However, Mr. Doctorow’s Nonfiction writing is different from 1-grams and other collected Nonfiction writings. These data could perhaps be used to create more entertaining works of Nonfiction.
INTRODUCTION
The evolution of new dialects of spoken languages has often been studied and modeled with respect to the contribution of idiolects (1) and the impact of spatial pressures (2). With technology allowing ease of communication between spatially distant individuals, there is the possibility that modern dialects may form between members of social networks who do not share a geographical location. This raises a somewhat semantic question: What constitutes a dialect? It is clear that graphemic communication can be very different from phonemic communication, as these have been suggested to be different dialects in English (3). However, this question is complicated, and it overlooks a simpler one: How similar or different are communication corpora from each other?
Over the past 25-30 years, modes of online communication such as Chat have appeared to combine phonemic and graphemic dialects, but it is unclear how much these modes differ. Modern modalities of communication were developed on the basis of previous forms of communication, but it is unclear whether their language usage is more similar to written or spoken “dialects”. In this study, I have analyzed the vocabulary from four different bodies of work: books from the Google Books project (4), chat logs from IRC, transcribed verbal communication (i.e., Talking), and over one million unique messages sent on Twitter.
What I find is unexpected. Though Chat and Twitter could be considered forms of written communication, their word usage patterns are more similar to Talking. Comparisons between Fiction and Nonfiction writing show that Fiction more closely resembles Chat and Twitter. Using a case study of author, speaker, and Twitter user Cory Doctorow, I find that the way someone speaks can be as similar to Nonfiction writing as it is to Fiction writing, but we incorrigibly Tweet like we speak.
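As a rough illustration of what comparing vocabulary usage across corpora can look like, the Python sketch below computes relative word frequencies for each corpus and scores every pair of corpora with cosine similarity. This is an assumption made for illustration, not the method used in the study: the tokenizer, the similarity measure, and the function names (word_frequencies, cosine_similarity, similarity_matrix) are all hypothetical, and the sample strings are placeholders rather than data from the corpora.

```python
from collections import Counter
import math
import re


def word_frequencies(text):
    """Return a dict mapping each word to its relative frequency in text.

    Tokenization here is a simple assumption: lowercase runs of letters
    and apostrophes. A real study would need a more careful tokenizer.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two word-frequency dicts (0.0 to 1.0)."""
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(corpora):
    """Score every ordered pair of named corpora.

    corpora maps a label (e.g. "Talking") to its raw text.
    Returns a dict keyed by (label_a, label_b).
    """
    names = sorted(corpora)
    freqs = {name: word_frequencies(corpora[name]) for name in names}
    return {
        (a, b): cosine_similarity(freqs[a], freqs[b])
        for a in names
        for b in names
    }


if __name__ == "__main__":
    # Placeholder strings only; these are not samples from the study's corpora.
    toy_corpora = {
        "Talking": "yeah so I think we should just go now",
        "Twitter": "yeah I think we should go lol",
        "Books": "the committee resolved that the matter should proceed",
    }
    for pair, score in similarity_matrix(toy_corpora).items():
        print(pair, round(score, 3))
```

A score near 1 means two corpora use words in similar proportions, and a score of 0 means they share no vocabulary. Raw frequencies are dominated by very common words, so a fuller analysis would likely also compare word classes, as the abstract describes.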
Journal: CoRR
Volume: abs/1403.0531
Publication date: 2014